Background. Enteroaggregative Escherichia coli (EAEC) is a cause of epidemic and sporadic diarrhea, yet its role 
as an enteric pathogen is not fully understood. 

Methods. We characterized 121 EAEC strains isolated in 2008 as part of a case-control study of moderate to 
severe acute diarrhea among children 0-59 months of age in Bamako, Mali. We applied multiplex polymerase chain 
reaction and comparative genome hybridization to identify potential virulence factors among the EAEC strains, 
coupled with classification and regression tree modeling to reveal combinations of factors most strongly associated 
with illness. 

Results. The gene encoding the autotransporter protease SepA, originally described in Shigella species, was most 
strongly associated with diarrhea among the EAEC strains tested (odds ratio, 5.6 [95% confidence interval, 1.92-16.17]; 
P = .0006). In addition, we identified 3 gene combinations correlated with diarrhea: (1) a clonal group positive for 
sepA and a putative hemolysin; (2) a group harboring the EAST-1 enterotoxin and the flagellar type H33 but no other 
previously identified EAEC virulence factor; and (3) a group carrying several of the typical EAEC virulence genes. 

Conclusion. Our data suggest that only a subset of EAEC strains are pathogenic in Mali and that sepA 
may serve as a valuable marker for the most virulent isolates. 



It is estimated that diarrhea causes at least 1.5 million 
deaths annually, mostly in children <5 years of age [1]. 
Although in aggregate the diarrheagenic Escherichia coli 
(DEC) pathotypes comprise the most common bacterial 
pathogens worldwide [2], each DEC pathotype is clinically, 
epidemiologically, and pathogenetically distinct. 
For some pathotypes, the key virulence factors are 
known, at least in part, whereas for other pathotypes, 
the key virulence genes and how they coordinately 
function in the setting of enteric disease remain elusive. 

The enteroaggregative E. coli (EAEC) pathotype has 
been implicated in travelers' diarrhea [3], in endemic 
diarrhea among children in both industrialized [4] and 
resource-poor countries [5], and in persistent diarrhea 
among individuals infected with human immunodeficiency 
virus. A recent outbreak of Shiga toxin-producing 
EAEC highlights its pathogenic potential [6]. Despite 
this, the molecular epidemiology of EAEC infection 
remains unclear, largely due to imperfect recognition 
of the true pathogenic factors within the broadly 
defined pathotype. 

Most EAEC strains colonize the intestinal mucosa via 
the aggregative adherence fimbriae (AAFs), which 
include at least 4 major antigenic variants [7-10]. AAFs 
are transcriptionally regulated by an AraC/XylS family 
activator called AggR [7, 11]. AggR is also required 
for expression of genes encoding dispersin (the aap 
gene), the Aat dispersin translocator [12], and the 
Table 1. Primers Used for the 4 Multiplex Polymerase Chain Reactions (PCRs) and 3 Monoplex PCRs, Description of Target Gene, 
Product Size in Base Pairs, Annealing Temperature, and Concentration of the Primers 

Multiplex PCR | Gene/Target | Description of Target | Primer Sequences (5'-3') | PCR Product, bp | Annealing Temperature (°C)/Primer Concentration, pmol/µL | GenBank Accession No. 
1 | astA | EAST-1 heat-stable toxin | ATGCCATCAACACAGTATAT [22]; GCGAGTGACGGCTTTGTAGT [22] | 110 | 58/20 | L11241 
1 | pet | Plasmid-encoded toxin | GGCACAGAATAAAGGGGTGTTT [23]; CCTCTTGTTTCCACGACATAC [23] | 302 | 58/25 | AF056581 
1 | sigA | IgA protease-like homolog | CCGACTTCTCACTTTCTCCCG [19]; CCATCCAGCTGCATAGTGTTTG [19] | 430 | 58/30 | NC_004337 
1 | pic | Serine protease precursor | ACTGGATCTTAAGGCTCAGGAT [23]; GACTTAATGTCACTGTTCAGCG [23] | 572 | 58/25 | AF097644 
1 | sepA | Shigella extracellular protease | GCAGTGGAAATATGATGCGGC [23]; TTGTTCAGATCGGAGAAGAACG [23] | 794 | 58/25 | Z48219 
1 | sat | Secreted autotransporter toxin [15] | TCAGAAGCTCAGCGAATCATTG [19]; CCATTATCACCAGTAAAACGCACC [19] | 932 | 58/25 | AE014075 
2 | ORF3 | Cryptic protein^ | CAGCAACCATCGCATTTCTA; CGCATCTTTCAATACCTCCA | 121 | 57/35 | 
2 | aap | Dispersin, antiaggregation protein [12] | GGACCCGTCCCAATGTATAA^; CCATTCGGTTAGAGCACGAT^ | 250 | 57/25 | Z32523 
2 | aaiC | AaiC, secreted protein | TGGTGACTACTTTGATGGACATTGT^; GACACTCTCTTCTGGGGTAAACGA^ | 313 | 57/25 | 
2 | aggR | Transcriptional activator | GCAATCAGATTAARCAGCGATACA^; CATTCTTGATTGCATAAGGATCTGG^ | 426 | 57/25 | Z18751 
2 | aatA | Dispersin transporter protein | CAGACTCTGGCRAAAGACTGTATCAT^; CAGCTAATAATGTATAGAAATCCGCTGT^ | 642 | 57/35 | AY351860 
3 | agg4A | AAF/IV fimbrial subunit | TGAGTTGTGGGGCTAYCTGGA^; CACCATAAGCCGCCAAATAAG | 169 | 57/25 | EU637023 
3 | aggA | AAF/I fimbrial subunit | TCTATCTRGGGGGGCTAACGCT^; ACCTGTTCCCCATAACCAGACC^ | 220 | 57/20 | Y18149, AY344586 
3 | aafA | AAF/II fimbrial subunit | CTACTTTATTATCAAGTGGAGCCGCTA^; GGAGAGGCCAGAGTGAATCCTG^ | 289 | 57/25 | AF012835 
3 | agg3A | AAF/III fimbrial subunit | CCAGTTATTACAGGGTAACAAGGGAA^; TTGGTCTGGAATAACAACTTGAACG^ | 370 | 57/25 | AF411067 
3 | agg3/4C^ | Usher, AAF/III-IV assembly unit | TTCTCAGTTAACTGGACACGCAAT^; TTAATTGGTTACGCAATCGCAAT^; TCTGACCAAATGTTATACCTTCAYTATG^ | 409 | 57/35 | AF411067, AB255435, EU637023 
3 | aafC | Usher, AAF/II assembly unit | ACAGCCTGCGGTCAAAAGC^; GCTTACGGGTACGAGTTTTACGG^ | 491 | 57/25 | AF114828 
4 | ORF61 | Plasmid-encoded hemolysin^ | AGCTCTGGAAACTGGCCTCT; AACCGTCCTGATTTCTGCTT | 108 | 57/10 | 
4 | eilA | Salmonella HilA homolog | AGGTCTGGAGCGCGAGTGTT^; GTAAAACGGTATCCACGACC^ | 248 | 57/30 | 
4 | capU | Hexosyltransferase homolog | CAGGCTGTTGCTCAAATGAA^; GTTCGACATCCTTCCTGCTC^ | 395 | 57/25 | AF134403 
4 | air | Enteroaggregative immunoglobulin repeat protein [24] | TTATCCTGGTCTGTCTCAAT; GGTTAAATCGCTGGTTTCTT | 600 | 57/25 | 
Singleplex | espY2 | Non-LEE-encoded type III secreted effector | GGGAAAAGATGGGGAAAATA^; TGAGGATTGGTGAGGTGAAG^ | 216 | 59/25 | ECSP_0073 
Singleplex | rmoA | Putative hemolysin expression-modulating protein | TTAGGTTAGATATTTGGATATG^; CGAAAAGAAAAGAGGAATGG^ | 210 | 60/25 | ECUMN_0072 
Singleplex | shiA | shiA-like inflammation suppressor^ | CAGAATGGGGGGGGTAAGGG [25]; GAGTGAAGGGTGGGTGATGATGGGGG [25] | 292 | 57/25 | ECB_03517 

Abbreviations: bp, base pair; PCR, polymerase chain reaction. 

^ Unpublished. 

^ Designed for this study. 

^ Two forward primers and 1 reverse primer were used for the amplification of agg3/4C. This primer set was designed to amplify the usher gene from both AAF/III 
and IV, hence the name. 

^ Primers used to amplify the shiA gene were forward primer from sisA gene and reverse primer from sisB gene, as described by Lloyd et al [25]. 


chromosomal cluster termed Aai, encoding a type VI secretion 
system [13]. Factors not under AggR control include the Air adhesin, 
a regulator termed EilA, the EAEC heat-stable toxin EAST-1 
(encoded by the astA gene), and a set of toxins termed the serine 
protease autotransporters of Enterobacteriaceae (SPATEs). 

SPATEs have been organized phylogenetically into 2 classes. 
Members of class 1 are cytotoxic to epithelial cells [14]; class 1 
SPATEs found in EAEC strains include the plasmid-encoded 
toxin (Pet) and its 2 homologs, Sat [15] and SigA [16]. The class 2, 
or noncytotoxic, SPATEs include Pic, a mucinase that promotes 
intestinal colonization [17, 18]. As with cytotoxic SPATEs and 
Pic, we have recently reported that the class 2 SPATE SepA is 
found commonly among EAEC strains [19]. SepA is a cryptic 
protease originally described in Shigella species, and is reported 
to contribute to intestinal inflammation [20]. Importantly, none 
of these factors is found in all EAEC isolates, and no single 
factor has ever been consistently implicated in EAEC virulence. 

Here, we characterize 121 EAEC strains isolated as part of 
a case-control study of acute moderate to severe diarrhea among 
children aged 0-59 months in Mali. We report that the sepA 
gene and flagellar type H33 are strongly associated with illness, 
and we define additional sets of virulence genes and factors that 
are important in this population. 

MATERIALS AND METHODS 

Study Design 

The strains utilized were isolated in the course of a prospective 
multicenter case-control study (Global Enteric Multi-Center 
Study, GEMS) of moderate to severe diarrhea among children 
<5 years of age. Full details of the GEMS design will be published 
elsewhere. In brief, children <59 months presenting to health 
centers for care with a complaint of diarrhea within the previous 
7 days were considered eligible. Cases were enrolled upon parental 
consent if they met criteria for moderate to severe diarrhea: 
signs of moderate to severe dehydration (sunken eyes, 
decreased skin turgor), dysentery (blood in stool), or a clinical 
judgment that hospitalization or intravenous rehydration was required. 
Diarrhea was defined as the passage of 3 or more unformed 
stools within a 24-hour period. A stool sample was obtained at 
enrollment and analyzed comprehensively for bacterial, viral, and 
protozoal agents. An age-matched asymptomatic control from the 
same neighborhood was enrolled for each case; a stool sample was 
obtained from the control child and analyzed similarly. 

Specimen Processing and Microbiological Analysis 

A single, fresh, whole stool specimen was collected from cases 
and controls at enrollment for the recovery of potential 
enteropathogens. Various specific growth media were used for 
detecting the bacterial pathogens. Up to 3 colonies with the 
appearance of E. coli on MacConkey agar were selected from 
each sample and tested using multiplex polymerase chain 
reaction (PCR) for enterotoxigenic E. coli (ETEC) (heat-labile 
[LT] and heat-stable [ST] enterotoxins), enteropathogenic E. 
coli (EPEC) (eae and bfpA), and EAEC (aaiC and aatA). Any 
colonies that were positive for either aaiC (chromosomally 
encoded) or aatA (encoded on the pAA plasmid) were considered 
EAEC for the purposes of this analysis. 

Serotyping 

Somatic (O) and flagellar (H) antigens were identified as described 
elsewhere [21]; the following designations were included: 
"O rough," the boiled culture autoagglutinated, suggesting 



Genomic Characterization of EAEC • JID 2012:205 (1 February) • 433 



absence of O antigen; "O?," it could not be determined whether 
the strain produces an O antigen (precipitation with Cetavlon 
indicates an acidic polysaccharide that could represent capsular 
K antigen); and "O+," the O antigen is present but could not be 
typed. Serotyping was performed at the International Escherichia 
and Klebsiella Centre (World Health Organization), Department 
of Microbiological Surveillance and Research, Statens Serum 
Institut, Copenhagen, Denmark. 

Polymerase Chain Reaction 

Primers and conditions for detecting sequences encoding 21 
putative virulence genes, which are described in Table 1, were 
used in 4 multiplex reactions. Multiplex PCR 1 was performed as 
previously described [19], with the addition of primers targeting 
astA. Multiplexes 2-4 were performed using PCR mastermix 
(2X) according to the manufacturer's instructions (Fermentas 
International), with the addition of 1 µL 50 mM magnesium 
chloride per 50 µL reaction. A DNA template was prepared by 
boiling a suspension of 10 isolated colonies in 200 µL distilled 
water. PCR reaction cycles were as follows: (1) 2 minutes 
denaturation at 95°C, (2) 50 seconds denaturation at 94°C, 
(3) annealing for 1.5 minutes, and (4) extension for 1.5 minutes 
at 72°C with 35 cycles returning to step 2. The final extension 
was 10 minutes at 72°C. Products were amplified using an 
Eppendorf Mastercycler Gradient thermal cycler (Eppendorf 
North America) and separated in 2% agarose gels. 
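
As a rough planning aid, the cycling program described above can be written down as data and its block time totalled. The sketch below is illustrative only: it assumes that "35 cycles returning to step 2" means steps 2-4 run 35 times in all, it takes the annealing temperature from Table 1 as a parameter, and it ignores ramp times. The names `Step`, `multiplex_program`, and `total_seconds` are hypothetical and not part of the original protocol.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    name: str
    temp_c: float
    seconds: int


def multiplex_program(anneal_temp_c: float, cycles: int = 35):
    """Return (initial steps, cycled steps, cycle count, final steps)."""
    initial = [Step("initial denaturation", 95.0, 120)]
    cycled = [
        Step("denaturation", 94.0, 50),
        Step("annealing", anneal_temp_c, 90),   # temperature per Table 1
        Step("extension", 72.0, 90),
    ]
    final = [Step("final extension", 72.0, 600)]
    return initial, cycled, cycles, final


def total_seconds(anneal_temp_c: float, cycles: int = 35) -> int:
    """Total hold time in seconds, ignoring ramping between temperatures."""
    initial, cycled, n, final = multiplex_program(anneal_temp_c, cycles)
    return (
        sum(s.seconds for s in initial)
        + n * sum(s.seconds for s in cycled)
        + sum(s.seconds for s in final)
    )


if __name__ == "__main__":
    secs = total_seconds(57.0)
    print(f"hold time: {secs} s ({secs / 60:.1f} min)")
```

Under these assumptions the hold time is the same for every multiplex, because only the annealing temperature differs between reactions.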

Individual amplification reactions to detect genes designated 
rmoA, espY2, and shiA were done in 25 µL reaction volumes 
using crude bacterial cell lysates; PCR reactions were performed 
as for multiplexes 2-4. The final extension was 10 minutes at 72°C. 

The phylogenetic groups A, B1, B2, and D were determined 
using triplex PCR methods employing phylogenetic group-specific 
primers for 2 genes, chuA and yjaA, and a cryptic DNA 
fragment, TspE4.C2. The grouping was coupled to a dichotomous 
decision tree according to Clermont et al [26]. 
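
The dichotomous decision tree can be sketched in a few lines. The version below follows the tree as it is commonly summarized for the triplex method of Clermont et al [26] (chuA first, then yjaA or TspE4.C2); it is an illustration rather than the study's own implementation, and the function name `clermont_phylogroup` is hypothetical.

```python
def clermont_phylogroup(chuA: bool, yjaA: bool, tspE4C2: bool) -> str:
    """Assign phylogenetic group A, B1, B2, or D from triplex PCR results.

    Tree as commonly summarized for the triplex method:
      chuA positive -> yjaA positive: B2, yjaA negative: D
      chuA negative -> TspE4.C2 positive: B1, TspE4.C2 negative: A
    """
    if chuA:
        return "B2" if yjaA else "D"
    return "B1" if tspE4C2 else "A"


if __name__ == "__main__":
    # Marker profiles (chuA, yjaA, TspE4.C2); the group calls are what the
    # tree yields for these profiles, not values reported in the text.
    profiles = {
        "chuA+ yjaA+": (True, True, False),
        "chuA+ only": (True, False, False),
        "TspE4.C2+ only": (False, False, True),
        "all negative": (False, False, False),
    }
    for label, markers in profiles.items():
        print(label, "->", clermont_phylogroup(*markers))
```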

The following strains were used as controls for detection of 
target genes: JM221 (aggA, sat) [27], 042 (aatA, aggR, aaiC, aap, 
ORF3, pic, pet, astA, aafA, aafC, air, capU, eilA) [28], 55989 
(agg3A, agg3/4C) [8], H223-1 (sigA) [29], C1010-00 (agg4A, 
agg3/4C, sat, sepA) [30], MC1061 (negative control), J96 (chuA, 
yjaA) [26], CFT073 (chuA, yjaA, TspE4.C2) [31], C452-97 
(TspE4.C2) [32], and EDL933 (chuA) [33]. 

Genomic Hybridization 

Comparative genome hybridization (CGH) was performed on 
all the sepA-positive strains as well as sepA-negative strains 
C801-09 and C46-10 and reference EAEC isolates as previously 
described [34]. The pan-genome microarrays used in this study 
were designed by FDA-ECSG Array Probe Set Design and 
represent the genomes of 32 diverse E. coli and Shigella species, as 
well as 46 enteric plasmid sequences [35]. Initial data analysis 
was performed with the Gene Chip Operating System suite of 
tools provided by Affymetrix. Additional analysis was performed 
using the Affymetrix power tools software. The MAS5 algorithm 
was utilized with the perfect match and mismatch calculations 
and a Tau of 0.150 to detect which probes were present or 
absent. Features that were present or absent in all samples were 
removed from further analysis. The resulting features, known as 
the variable gene set, were analyzed using Multiple Experiment 
Viewer, version 4.5. The cladogram was constructed using the 
12 673 variable features in this dataset, which contained 
hybridization data from 36 strains. The relationship was determined 
using hierarchical clustering with Pearson correlation, using the 
absolute distance and complete linkage run with 500 bootstrap 
calculations. 
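
Two steps of this analysis are simple enough to illustrate directly: dropping features that are present or absent in every strain, and computing a Pearson correlation distance between two strains' present/absent profiles. The sketch below uses toy data and hypothetical function names; it takes 1 - r as the distance, which is one common convention, and does not attempt to reproduce the MAS5 calls, the "absolute distance" option, the complete-linkage clustering, or the bootstrapping performed in the analysis software.

```python
from math import sqrt


def variable_features(calls):
    """Keep only features that differ between strains.

    calls: dict mapping strain name -> list of 0/1 calls (equal lengths).
    Returns (filtered calls, indices of the retained features).
    """
    strains = list(calls)
    n_features = len(calls[strains[0]])
    keep = [
        j for j in range(n_features)
        if 0 < sum(calls[s][j] for s in strains) < len(strains)
    ]
    return {s: [calls[s][j] for j in keep] for s in strains}, keep


def pearson_distance(x, y):
    """Return 1 - Pearson r for two equal-length numeric profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("profile has no variation; correlation undefined")
    return 1 - cov / sqrt(vx * vy)


if __name__ == "__main__":
    # Toy present (1) / absent (0) calls for three hypothetical strains.
    toy = {
        "s1": [1, 1, 0, 1, 0],
        "s2": [1, 1, 0, 0, 1],
        "s3": [1, 0, 1, 0, 1],
    }
    filtered, kept = variable_features(toy)
    print("retained feature indices:", kept)
    print("d(s1, s2) =", pearson_distance(filtered["s1"], filtered["s2"]))
```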

Statistics 

We utilized classification and regression tree (CART) Pro 
Version 6.0 (Salford Systems) software inputting 21 or 24 
factors of interest as binary (present/absent) independent 
predictive variables along with a continuous "factor total" 
that was a sum of all factors including flagellum type 
H33. Case/control status was the binary dependent outcome 
variable. 
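
The model inputs described above can be pictured as one row per strain: a set of binary indicators, their sum as the "factor total," and the case/control label. The sketch below is an assumption-laden illustration of that layout only; the helper names and the example strain are hypothetical, and nothing here reproduces the commercial CART software.

```python
def factor_total(factors):
    """Count the factors scored as present for one strain."""
    return sum(1 for present in factors.values() if present)


def cart_row(factors, is_case):
    """Build one input row: binary predictors, factor total, and outcome."""
    row = {name: int(bool(present)) for name, present in factors.items()}
    row["factor_total"] = factor_total(factors)
    row["case"] = int(is_case)
    return row


if __name__ == "__main__":
    # Hypothetical strain profile, for illustration only.
    example = {"sepA": True, "astA": False, "aggR": True, "H33": True}
    print(cart_row(example, is_case=True))
```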

RESULTS 

Initial Characterization of EAEC Strains 

After 1 year of surveillance, EAEC strains were isolated as the 
sole DEC pathogen from 60 children with diarrhea and 61 
asymptomatic controls. The lack of association of EAEC with 
diarrhea persisted when the presence of other potential pathogens 
was considered, when the analysis was stratified by age, 
and when aatA and aaiC were considered alone or in 
combination. 

One EAEC isolate was selected from each stool sample that 
yielded EAEC by multiplex PCR. The 121 EAEC strains 
belonged to diverse serotypes (Table 2; cases listed in Table 2A and 
controls listed in Table 2B). Examination of the correlation 
between serotype and case/control status revealed that only 
flagellum type H33 was significantly more common among 
cases than controls (12 cases, 2 controls; odds ratio [OR], 5.9; 
P = .0138) (Table 3). EAEC cases and control strains were 
localized to similar positions within a previously published 
general E. coli phylogenetic tree (Tables 2 and 3). 

Frequencies of Virulence-Related Genes 

In order to assess the roles of putative virulence factors in 
EAEC epidemiology, we developed 4 multiplex PCR assays 
for the characterization of 21 genes previously found in EAEC 
strains. The results of the PCR assays for all strains are listed 
in Table 2. Of the 21 genes scored, hypothetical ORF3 was the 
most frequently detected (86%) followed by eilA (85.1%), 
capU (81.8%), aap (71.9%), aggR (69.4%), and aatA (68.6%) 
(Table 3). There was a high degree of concordance of these 
genes, which has been demonstrated previously for the 
Table 2a. Characteristics of 60 Enteroaggregative Escherichia coli Strains Isolated From Children With Diarrhea 
Table 2b. Characteristics of 61 EAEC Strains From Asymptomatic Control Children. 




Black boxes represent positivity. The designations of all Malian strains are preceded by the letter "c" in the text. 
Abbreviation: EAEC, enteroaggregative Escherichia coli. 
^ Age in months. 

^ Virulence factor score is based on the number of genes 1 strain is positive for. 
^ Strains analyzed for other virulence genes by comparative genomic hybridization. 
Table 3. Distribution of Enteroaggregative Escherichia coli Virulence Factors in Cases and Controls 

Factor | Cases (n = 60), No. (%) | Controls (n = 61), No. (%) | Total (N = 121), No. (%) | Odds Ratio | [95% CI] | χ² | P Value 

EAEC Virulence Gene 
aatA | 37 (61.7) | 46 (75.4) | 83 (68.6) | 0.5 | [.24-1.15] | 2.7 | .10 
aggR | 38 (63.3) | 46 (75.4) | 84 (69.4) | 0.6 | [.26-1.23] | 2.1 | .15 
aaiC | 32 (53.3) | 26 (42.6) | 58 (47.9) | 1.5 | [.75-3.15] | 1.4 | .24 
aap | 39 (65.0) | 48 (78.7) | 87 (71.9) | 0.5 | [.22-1.19] | 2.8 | .09 
ORF3 | 49 (81.7) | 55 (90.2) | 104 (86.0) | 0.5 | [.17-1.39] | 1.8 | .18 
sat | 24 (40.0) | 33 (54.1) | 57 (47.1) | 0.6 | [.28-1.16] | 2.4 | .12 
sepA | 20 (33.3) | 5 (8.2) | 25 (20.7) | 5.6 | [1.92-16.17] | 11.7 | .0006 
pic | 29 (48.3) | 27 (44.3) | 56 (46.3) | 1.2 | [.58-2.41] | 0.2 | .66 
sigA | 8 (13.3) | 7 (11.5) | 15 (12.4) | 1.2 | [.40-3.51] | 0.1 | .76 
pet | 4 (6.7) | 6 (9.8) | 10 (8.3) | 0.7 | | | .74 
astA | 32 (53.3) | 30 (49.2) | 62 (51.2) | 1.2 | [.58-2.41] | 0.2 | .65 
aafC | 5 (8.3) | 5 (8.2) | 10 (8.3) | 1.0 | | | >.999 
agg3/4C | 42 (70.0) | 40 (65.6) | 82 (67.8) | 1.2 | [.57-2.63] | 0.3 | .60 
agg3A | 1 (1.7) | 5 (8.2) | 6 (5.0) | 0.2 | | | .21 
aafA | 3 (5.0) | 3 (4.9) | 6 (5.0) | 1.0 | | | >.999 
aggA | 11 (18.3) | 21 (34.4) | 32 (26.4) | 0.4 | [.18-.99] | 4.0 | .04 
agg4A | 5 (8.3) | 1 (1.6) | 6 (5.0) | 5.5 | | | .11 
air | 20 (33.3) | 29 (47.5) | 49 (40.5) | 0.6 | [.26-1.15] | 2.5 | .11 
capU | 48 (80.0) | 51 (83.6) | 99 (81.8) | 0.8 | [.31-1.99] | 0.3 | .61 
eilA | 50 (83.3) | 53 (86.9) | 103 (85.1) | 0.8 | [.28-2.09] | 0.3 | .58 
ORF61 | 28 (46.7) | 44 (72.1) | 72 (59.5) | 0.3 | [.16-0.72] | 8.1 | .004 
espY2 | 13 (21.6) | 20 (32.8) | 33 (27.3) | 0.6 | [.25-1.28] | 1.9 | .17 
rmoA | 30 (50.0) | 23 (37.7) | 53 (43.8) | 1.7 | [.80-3.39] | 1.9 | .17 
shiA | 21 (35) | 22 (36.1) | 43 (35.5) | 0.5 | [.45-2.01] | 0.01 | .92 

EAEC Serogroup 
O99 | 5 (8.3) | 2 (3.3) | 7 (5.8) | 2.7 | | | .27 
O153 | 6 (10.0) | 1 (1.6) | 7 (5.8) | 6.7 | | | .61 
H- | 7 (11.7) | 17 (27.9) | 24 (19.9) | 0.3 | [.12-.98] | 4.9 | .04 
H5 | 6 (10.0) | 2 (3.3) | 8 (6.6) | 3.3 | | | .16 
H9 | 5 (8.3) | 1 (1.6) | 6 (5.0) | 5.5 | | | .21 
H30 | 9 (15.0) | 4 (6.6) | 13 (10.7) | 2.5 | [.73-8.66] | 2.2 | .13 
H33 | 10 (16.7) | 2 (3.3) | 12 (9.9) | 5.9 | [1.23-28.19] | 6.7 | .01 

Phylogenetic Group 
A | 23 (38.3) | 14 (22.9) | 37 (32.2) | 2.1 | [.95-4.61] | 3.4 | .07 
B1 | 15 (25) | 13 (21.3) | 28 (23.1) | 1.2 | [.53-2.78] | 0.2 | .63 
B2 | 8 (13.3) | 9 (14.7) | 17 (14) | 0.9 | [.32-2.48] | 0.05 | .82 
D | 14 (23.3) | 25 (40.9) | 39 (32.2) | 0.4 | [.19-.96] | 4.3 | .04 

P < .05 is significant. Fisher exact test was applied when the comparisons between cases and controls were <5 observations. 
Abbreviations: CI, confidence interval; EAEC, enteroaggregative Escherichia coli. 


plasmid-encoded aap, aggR, and aatA genes [7, 12, 36-38]. 
Sixty-eight percent of the strains were positive for the 
usher-encoding gene agg3/4C (the ushers for AAF/III and AAF/IV 
variants are closely related). The most frequent AAF pilin gene 
was that of AAF/I, encoded by aggA (26.4%), followed by 
those of AAF/II (aafA), AAF/III (agg3A), and AAF/IV (agg4A) 
at 5% each (Table 3). The agg4A gene was found more 
frequently among cases than controls (5 cases and 1 control), 
although this difference did not reach statistical significance. 
A total of 71 strains (58.7%) were negative for a known AAF 
variant. 

Of the 5 genes encoding SPATEs, the most frequent were sat 
(47.1%) and pic (46.3%). The least common SPATEs were pet 
(8.3%) and sigA (12.4%). The sepA gene was found in 25 strains 
(20.7%): 20 from cases and 5 from controls, yielding an OR of 
5.6 (P = .0006) (Table 3). Among all the putative virulence 
factors scored, sepA was the only one significantly associated 
with moderate to severe diarrheal illness. 
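
The sepA risk estimate can be approximately rechecked from the counts reported in Table 3 (20 of 60 cases and 5 of 61 controls). The sketch below computes the odds ratio, an uncorrected Pearson chi-square, and a 95% confidence interval using the Woolf (log odds ratio) approximation. The interval method is an assumption, since the text does not state which method was used, so the bounds may differ slightly from the published ones; the function name `two_by_two` is hypothetical.

```python
from math import exp, log, sqrt


def two_by_two(cases_pos, cases_total, controls_pos, controls_total, z=1.96):
    """Odds ratio, Woolf 95% CI, and uncorrected chi-square for a 2x2 table."""
    a, b = cases_pos, cases_total - cases_pos            # cases: +, -
    c, d = controls_pos, controls_total - controls_pos   # controls: +, -
    odds_ratio = (a * d) / (b * c)
    se_log_or = sqrt(1 / a + 1 / b + 1 / c + 1 / d)      # Woolf standard error
    lower = exp(log(odds_ratio) - z * se_log_or)
    upper = exp(log(odds_ratio) + z * se_log_or)
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    return odds_ratio, (lower, upper), chi2


if __name__ == "__main__":
    or_, (lo, hi), chi2 = two_by_two(20, 60, 5, 61)
    print(f"OR = {or_:.1f}, 95% CI {lo:.2f}-{hi:.2f}, chi-square = {chi2:.1f}")
```

With these counts the odds ratio and chi-square round to the values in Table 3 (5.6 and 11.7). Cells with very small counts would call for an exact method instead, as noted in the table footnote.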
Figure 1. Classification and regression tree (CART) classification tree topology reveals combinations of factors most strongly associated with moderate to 
severe diarrhea. We considered all genotypic and phenotypic assays performed: aatA, aggR, aaiC, aap, ORF3, sat, sepA, pic, sigA, pet, astA, aafC, agg3/4C, 
aafA, agg3A, aggA, agg4A, air, capU, eilA, ORF61, virulence factor score (VFS), and flagellum type H33. Each branch of the CART tree ends in a terminal "node" 
(red boxes), and each terminal node is uniquely defined by the presence or absence of a predictive factor such as a gene or VFS. The tree is hierarchical in 
nature. C701-09, C718-09, C801-09, and C46-10 are also shown on the dendrogram. 



Significance of Combinations of EAEC Genes 

In addition to considering each factor individually, we pursued 
a number of approaches to consider the importance of 
combinations of potential EAEC virulence factors. When crudely 
considering the collective number of virulence loci present 
(generating a virulence factor score, VFS), the average number 
of virulence genes from cases was 8.75 versus 9.5 from control 
isolates. 

To consider combinations of factors, we employed CART 
analysis, which builds a model in stepwise fashion to yield the 
combination of factors most strongly associated with the 
queried outcome. Each branch of a CART output tree ends in 
a terminal "node"; each observation falls into exactly 1 terminal 
node; and each terminal node is uniquely defined by a set of 
rules, such as having or not having a certain factor. 
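
A single step of such a stepwise model can be illustrated with the Gini impurity, a criterion commonly used by CART implementations. Whether the software run used exactly this criterion or other settings is not stated, so the sketch below is an assumption: it scores three binary factors using counts taken from Table 3 and picks the one whose presence/absence split reduces impurity the most. All names in the code are hypothetical.

```python
def gini(cases, controls):
    """Gini impurity of a node holding the given case and control counts."""
    n = cases + controls
    if n == 0:
        return 0.0
    p = cases / n
    return 2 * p * (1 - p)


def gini_decrease(pos_cases, pos_controls, neg_cases, neg_controls):
    """Impurity reduction from splitting on presence/absence of one factor."""
    n_pos = pos_cases + pos_controls
    n_neg = neg_cases + neg_controls
    n = n_pos + n_neg
    parent = gini(pos_cases + neg_cases, pos_controls + neg_controls)
    children = (
        n_pos * gini(pos_cases, pos_controls)
        + n_neg * gini(neg_cases, neg_controls)
    ) / n
    return parent - children


# (factor-positive cases, factor-positive controls,
#  factor-negative cases, factor-negative controls), from Table 3
COUNTS = {
    "sepA": (20, 5, 40, 56),
    "ORF61": (28, 44, 32, 17),
    "aggA": (11, 21, 49, 40),
}


def best_first_split(counts):
    """Name of the factor with the largest impurity reduction."""
    return max(counts, key=lambda name: gini_decrease(*counts[name]))


if __name__ == "__main__":
    for name, c in COUNTS.items():
        print(f"{name}: {gini_decrease(*c):.4f}")
    print("first split:", best_first_split(COUNTS))
```

A full tree repeats this choice within each resulting subset of strains until a stopping rule is met; that recursion, and any pruning, is omitted here.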

We considered all genotypic and phenotypic assays performed 
and interrogated the association with case status. Figure 1 
illustrates the best CART fit for the dataset. The analysis 
demonstrates that the presence of sepA, regardless of the 
presence or absence of any other scored genotype or phenotype 
Figure 2. Cladogram of comparative genomic hybridization data of sepA-positive isolates (C34-10, C35-10, C679-09C, C682-09, C693-09, C701-09, C703-09, 
C716-09, C718-09, C719-09, C729-09, C736-09, C745-09, C764-09, C765-09, C769-09, C771-09, C778-09, C783-09, C796-09, C697-09, C707-09, C710-09, 
C734-09, and C748-09), sepA-negative isolates (C46-10 and C801-09), and reference isolates (C1096, 042, 55989, JM221, 17-2, 34b, 101-1, and HS). Notably, 
C46-10 was most closely related to Mexican enteroaggregative Escherichia coli (EAEC) strain JM221 (isolated from an adult traveler to Guadalajara [27]), and 
strain C801-09 was most closely related to EAEC strain 042, isolated from a child with diarrhea in Lima, Peru [28]. The phylogenetic comparison was 
performed using the 12 673 variable features of the 36 hybridizations included. The tree is built using a hierarchical clustering with Pearson correlation using 
both the absolute distance and complete linkage and viewed in FigTree (http://tree.bio.ed.ac.uk/software/figtree/). Isolates represented in black are 
reference isolates, controls are indicated in blue, and cases in red. The serotypes of the strains are indicated to the right. The gray boxes identify clusters of 
serotypes within the context of the larger tree, indicating that those serogroups are genomically similar. 



among the sepA-positive strains, provides a strong association 
with diarrhea. 

Among the sepA-negative strains, CART analysis suggested 2 
additional trait clusters that were associated with moderate to 
severe diarrheal illness: 1 cluster included those strains 
harboring the flagellum H33 and the toxin EAST-1, whereas 
a second cluster lacked H33 but featured a VFS of 9, suggesting 
a combination of typical EAEC factors in addition to the Sat 
toxin. 

Genomic Analyses 

We hypothesized that the strain sets belonging to the nodes most 
strongly associated with diarrhea would reveal the presence of 
additional virulence determinants, which themselves might 
explain the observed clinical correlations. We therefore performed 
CGH analysis using a previously described microarray 
containing the full genomes of 32 E. coli and Shigella strains and 
the genes of an additional 46 E. coli plasmids [35]. For this 
analysis, we chose all 25 sepA-positive strains, 2 additional strains 
(C46-10 and C801-09) representing CART (Figure 1) nodes 2 
(SepA absent, H33 present, EAST-1 present) and 3 (SepA absent, 
H33 absent, >8.5 VFS, Sat present), and a set of archetype EAEC 
reference strains. Standard cluster analysis was performed 
on the microarray data (Figure 2). All isolates belonging to 
a common serotype clustered together in this analysis. Although 
cluster analysis did not suggest genomic differences 
discriminating cases and controls, the analysis did suggest that 
sepA-positive strains segregated into 2 major clusters (indicated as 
A and B in Figure 2). We chose for further genomic 
examination archetypal strains representing sepA-positive clusters A 
Table 4. Comparative Genomic Hybridization of Strains C701-09, C718-09, C801-09, and C46-10 Against a Microarray That Comprises the 
Full Genomes of 32 Escherichia coli and Shigella Strains and the Genes of an Additional 46 E. coli Plasmids 









Hybridization by Genome 






Putative Virulence Gene^ 


Accession No. 


Nonpathogenic^ 


C701-09 C718-09 C801-09 


C46-10 


Adhesins 


csgA; cryptic curlin major subunit^ 


SBO_2026 


+ 


+ + 


+ 


_ 


csgA; major curlin subunit^ 


LF82_0360 


+ 


+ 


+ 


_ 


csgC; putative autoagglutination protein^ 


ECUMN_1217 


+ 


+ + 


+ 


_ 


ecpD; putative chaperone protein EcpD^ 


SBO_0126 


+ 


+ 


+ 


_ 


Fimbrial usher family protein^ 


SbBS512_E2717 


+ 


+ + 


_ 


+ 


Flu; antigen 43 (Ag43)^ 


ECUMN_3400 


+ 


+ 


_ 


+ 


Hemagglutinin family'^ 


SbBS512_E4026 




+ 


+ 


_ 


Putative AidA-I adhesin-like protein 


ECO26_3415 


_ 


_ _ 


+ 


+ 


Putative AidA-I adhesin-like protein^ 


ECO26_1353 


+ 


_ _ 


+ 


_ 


Putative chaperone protein EcpD 


ECUMN_0137 


_ 


+ 


+ 


_ 


Putative fimbrial biogenesis outer membrane usher protein 


ECUMN_0019 


_ 


+ 


+ 


_ 


Putative fimbrial protein^ 


SbBS512_E2376 


+ 


+ + 


+ 


_ 


Putative fimbrial-like protein^ 


SD.Y_0915 


+ 


+ + 


+ 


_ 


Putative invasin^ 


EcSMS35_1146 


+ 


+ + 


+ 


_ 


Putative type 1 fimbrial protein 


ECSP_0022 


_ 


_ _ 


+ 


_ 


sfmD; putative outer membrane export usher protein SfmD^ 


ECO26_0565 


+ 


+ 


+ 


_ 


sfmF; putative fimbrial-like adhesin protein SfmF^ 


ECO26_0567 


+ 


+ 


+ 


_ 


sfmH; putative fimbrial-like adhesin protein^ 


ECUMN_0573 


+ 


+ 


+ 


_ 


siiEA; adhesin for cattle intestine colonization 


ECUMN_0527 


_ 


_ _ 


+ 


_ 


yfaL; adhesin YfaL^ 


ECO26_3226 


+ 


+ + 


+ 




yfcP; putative fimbrial-like adhesin protein^ 


BWG_2107 


+ 


+ 






yfcQ; putative fimbrial-like adhesin protein^ 


BWG_2108 


+ 


+ 






yfcR; putative fimbrial-like adhesin protein^ 


BWG_2109 


+ 


+ 




+ 


yfcS; putative periplasmic pilus chaperone^ 


BWG_2110 


+ 


+ 


+ 


+ 


yfcS; putative periplasmic pilus exported chaperone^ 


ECUMN_2676 


+ 


+ 


+ 


+ 


yfcT; outer membrane export usher protein^ 


ECDH10B_2499 


+ 


+ 


+ 


+ 


yfcU; export usher protein^ 


ECDH10B_2500 


+ 


+ 


+ 


+ 


yfcU; outer membrane usher protein 


E2348C_2477 




+ 


+ 


+ 


yfcV; predicted fimbrial protein-like protein 


E2348C_2478 




+ 


+ 




Toxins 


Hcp-like protein^ 


SSON_0233 


+ 


+ 


+ 




hlyE; hemolysin E^ 


ECO26_1695 


+ 


+ + 


+ 




Secretion Systems 


espY2; Non-LEE-encoded Type III Secreted Effector 


ECSP_0073 






+ 




Hypothetical protein; type VI secretion system 
secreted protein VgrG*^ 


ECSP_0240 


+ 


+ 


+ 


+ 


Putative type II secretion protein (Gspl-like)^ 


ECIAI1_3105 


+ 


+ 


+ 




Putative type III secretion protein EpaR^ 


ECUMN_3195 


+ 




+ 




T3SS effector-like protein EspL-homolog^ 


ECO111_4829 


+ 


+ 


+ 




tolC; outer membrane channel protein^ 


SDY_3205 


+ 


+ + 




+ 


Type III secretion protein EpaQ*^ 


ECO26_3940 


+ 


+ 


+ 




Type III secretion protein EpaR^ 


ECO103_3428 


+ 


+ 


+ 




Type III secretion protein EprJ^ 


ECO26_3933 


+ 


+ 


+ 




Other 


Hemolysin expression-modulating protein 


EC55989_3351 




+ + 


+ 




Putative hemolysin expression-modulating protein RmoA 


ECUMN_0072 




+ + 




+ 


Putative hemolysin co-regulated protein^ 


SSON_0255 


+ 




+ 


+ 


ShiA-like protein 


ECB_03517 








+ 



^ EAEC genes are listed in Table 2A and 2B. 

One hundred percent identities with HS and/or K12. 
^ Eighty-eight percent identities with HS. 

Fifty-five percent to 62% identities with HS and/or K12. 
Figure 3. Classification and regression tree analysis described in Figure 1, adding the genes shiA, espY2, and rmoA (hemolysin expression-modulation 
protein). See Figure 1 legend for details of analysis. 



and B, as well as the 2 additional nodes (2 and 3) indicated by 
CART analysis (Figure 1). 

Genome Analysis of sepA-Positive Strains 

To represent the sepA-positive strain cluster A (Figure 2), we 
chose strain C718-09 for further genome analysis; to represent 
cluster B, we chose strain C701-09. Results are presented in 
Table 4. Genome analysis of strain C718-09 did not reveal the 
presence of additional genes that were not also carried by 
nonpathogenic E. coli strains. C701-09 hybridized to an open 
reading frame that was 99% identical to the rmoA gene encoded 
on plasmid R100 (GenBank accession number Y13856.1); 
rmoA encodes a predicted 69 amino acid putative hemolysin 
expression-modulation protein that is 100% identical to protein 
RmoA found on plasmid R100 from E. coli [39]. The protein 
sequence of R100 RmoA exhibits 52% identity and 75% 
amino acid similarity with Hha protein from E. coli K12 
(GenBank accession number NP_414993). 

Genome Analysis of sepA-Negative Strains 

Strains C46-10 and C801-09 were representative of the 2 sepA- 
negative nodes that were associated with diarrhea in Figure 1. 

C46-10 Genome Analysis 

Strain C46-10 best represented the confluence of factors iden- 
tified in node 2 (Figure 1), characterized as sepA absent, H33 
present, and EAST-1 present. By CGH, strain C46-10 hybridized 
to a large number of genes found among DEC pathotypes and 
Shigella species (Table 4) and encodes a complete yfc gene 
cluster, which has been proposed to encode a novel usher-chaperone 
fimbrial adhesin [40]. C46-10 harbored elements of 
a type VI secretion system homologous to VgrG of Agrobacterium. 
However, this component of the newly described type VI systems 
is also found among nonvirulent isolates and has not yet been 
assigned any virulence function among DEC or Shigella strains. 

C46-10 DNA was found to hybridize with the 347 amino acid 
ShiA-like protein from E. coli strain REL606 (GenBank accession 
number YP_003046696). The latter protein exhibited 97% 
identity with the ShiA protein initially described in Shigella 
flexneri 5a strain M90T (GenBank accession number AF141323) 
[41]. ShiA and related proteins identified in uropathogenic E. coli 
and Shigella strains have been found to suppress the in- 
flammatory response in animal models [42]. 

C801-09 Genome Analysis 

C801-09 is closely related to the virulent archetype EAEC strain 
042 and harbors many of the same virulence genes, including 
a near-complete plasmid-borne AggR regulon. It represents the 
most common serotype found in our study (O153:H30). Like 
C701-09, C801-09 harbored homologs of a large number of 
adhesins, including the siiEA locus that is associated with col- 
onization of cattle by E. coli strain UMN026. C801-09 also 
harbored EspY2, a non-LEE-encoded type III secreted effector 
from E. coli O157:H7 strain TW14359 (GeneID: 8214639). Five 
proteins, EspY1-5 from the E. coli O157:H7 Sakai strain, possess 
an N-terminal WEX5F domain, which has been linked to type 
III secretion and is conserved in several well-characterized 
Salmonella effectors and in putative effectors from Edwardsiella and 
Sodalis [43]. 

Screening of the EAEC Collection for Presence of espY2, rmoA, 
and shiA Genes 

Based on the CGH analysis from strains C46-10, C701-09, 
C718-09, and C801-09, we inferred that espY2, rmoA, and shiA 
were the factors most plausibly associated with virulence. Using 
PCR, we found that 35.5% of the EAEC strains from cases and 
a similar percentage from controls harbored the shiA gene and 
27.3% of each group harbored espY2. The rmoA gene was found 
in 43.8% of the EAEC strains (50% from cases and 37.7% from 
controls). None of the 3 genes were independently associated 
with diarrheal illness (Table 3). However, when we repeated the 
CART analysis including the espY2, rmoA, and shiA genes 
(Figure 3), sepA once again exhibited a strong association, yet 
strains that were both sepA- and rmoA-positive were most 
strongly associated with disease (13 out of 14 strains positive 
for this combination were present among cases). 
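The single-gene comparison for rmoA can be illustrated with a small odds ratio calculation. The sketch below is not the authors' analysis: the counts (30 of 60 case strains and 23 of 61 control strains positive for rmoA) are an assumption, back-calculated from the rounded percentages reported above for 121 strains, and the Woolf (log-normal) interval is one common approximation chosen here for illustration. The function name `odds_ratio_ci` is hypothetical.

```python
import math


def odds_ratio_ci(case_pos, case_neg, ctrl_pos, ctrl_neg, z=1.96):
    """Odds ratio with a Woolf (log-normal) confidence interval.

    Arguments are the four cells of a 2x2 table of gene presence
    (pos/neg) by outcome (case/control). z=1.96 gives an approximate
    95% interval.
    """
    odds_ratio = (case_pos * ctrl_neg) / (case_neg * ctrl_pos)
    # Standard error of log(OR) under the Woolf approximation.
    se = math.sqrt(1 / case_pos + 1 / case_neg + 1 / ctrl_pos + 1 / ctrl_neg)
    lower = math.exp(math.log(odds_ratio) - z * se)
    upper = math.exp(math.log(odds_ratio) + z * se)
    return odds_ratio, lower, upper


# ASSUMED counts, back-calculated from the rounded percentages in the text
# (50% of cases, 37.7% of controls, 43.8% of 121 strains overall).
# They are not taken from the authors' tables.
or_rmoA, ci_low, ci_high = odds_ratio_ci(
    case_pos=30, case_neg=30, ctrl_pos=23, ctrl_neg=38
)
print(f"rmoA OR = {or_rmoA:.2f} (approx. 95% CI {ci_low:.2f}-{ci_high:.2f})")
```

Under these assumed counts the interval includes 1, which is consistent with the statement that rmoA alone was not independently associated with illness.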

DISCUSSION 

EAEC is a common diarrheal isolate, yet apart from 
outbreak-associated strains, identification of truly pathogenic strains 
remains difficult. A large number of virulence factors and 
combinations have been associated with clinical illness in 
epidemiologic studies, and it is possible that either the principal 
determinants of pathogenicity vary by site and population or 
that the true determinants have not yet been identified. 

We report the most detailed genomic characterization of 
EAEC performed to date, targeting a collection of 121 EAEC 
strains isolated from children in Mali with or without moderate 
to severe diarrhea. In agreement with previous reports [44-46], 
our strains belonged to a diverse range and combination of O:H 
and phylogenetic types. Although no specific O:H combination 
was associated with diarrhea, strains expressing the H33 flagellar 
antigen were found significantly more often in cases than in 
controls. This association may signify the existence of a specific 
set of virulence genes in strains of this H type. 

To profile the virulence genes of our strain set, we developed 
and applied 4 multiplex PCR assays targeting 21 putative virulence 
genes. We found our EAEC strains to be astonishingly 
diverse. The only factor associated individually with diarrhea in 
these analyses was the Shigella SPATE toxin SepA. Recognizing 
that pathogenicity represents the concerted action of multiple 
virulence factors, which can sort independently throughout the 
E. coli population, we assessed combinations of virulence factors 
using CART analysis. This analysis reinforced the association of 
sepA with diarrhea, independent of any of the other 20 genes 
scored. We then performed comprehensive genomic analyses 
on the sepA-positive strains using CGH against a reference set 
of E. coli genomes. These studies identified the hemolysin 
expression-modulating protein RmoA as commonly present in 
combination with SepA and served to strengthen the association 
of SepA with clinical illness (Figure 3). Our data demonstrate the 
importance of strains encoding a combination of virulence 
factors (here SepA and RmoA), although additional factors 
may colocalize with these genes. 
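The core step of a classification tree, choosing the presence/absence split that best separates cases from controls, can be sketched as follows. This is a minimal illustration of one Gini-impurity split, not the CART software or settings used in the study. The strain records and labels are made-up toy data, and the names `gini` and `best_split` are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means a pure group)."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def best_split(rows, labels, genes):
    """Return (gene, weighted_gini) for the gene whose presence/absence
    split gives the lowest weighted impurity, or None if no gene splits
    the data into two non-empty groups."""
    n = len(rows)
    best = None
    for gene in genes:
        present = [y for r, y in zip(rows, labels) if r[gene]]
        absent = [y for r, y in zip(rows, labels) if not r[gene]]
        if not present or not absent:
            continue  # this gene does not divide the strains
        score = (len(present) * gini(present) + len(absent) * gini(absent)) / n
        if best is None or score < best[1]:
            best = (gene, score)
    return best


# Toy data (hypothetical): 1 = gene detected by PCR, 0 = not detected.
rows = [
    {"sepA": 1, "aggR": 1},
    {"sepA": 1, "aggR": 0},
    {"sepA": 1, "aggR": 1},
    {"sepA": 0, "aggR": 1},
    {"sepA": 0, "aggR": 0},
    {"sepA": 0, "aggR": 1},
    {"sepA": 0, "aggR": 0},
    {"sepA": 0, "aggR": 0},
]
labels = ["case", "case", "case", "control", "control", "case", "control", "control"]

gene, score = best_split(rows, labels, ["sepA", "aggR"])
print(gene, round(score, 3))
```

A full tree repeats this step within each resulting subgroup, which is how combinations of genes, rather than single genes, emerge as nodes.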

Among the sepA-negative strains, CART analysis suggested 
2 combinations of factors that indicate virulent strains (Figure 1). 
CGH analysis of strains representative of these combinations 
(strains C801-09 and C46-10) revealed 2 additional factors: T3SS 
effector EspY2 and ShiA, the latter being associated with modulation 
of the inflammatory response. However, screening the 
complete strain set for the presence of these 2 factors, followed 
by revised CART analysis, did not suggest that these 2 genes 
strengthened the association with moderate to severe diarrhea. 

The association of the toxin EAST-1 with diarrhea only oc- 
curred among strains that lacked the majority of the AggR 
regulon, suggesting that they may require virulence factors not 
yet apparent; these may occur predominantly in strains harboring 
flagellar type H33 (Figure 1). EAST-1-positive strains have 
previously been implicated in pediatric diarrhea [47], so these 
strains may warrant continued investigation. 

The terms typical EAEC and atypical EAEC have been suggested 
to refer to EAEC strains harboring or lacking AggR, 
respectively. Some studies have demonstrated an association of 
typical EAEC with diarrhea [5, 48]. We did not observe any 
correlation of AggR regulon genes with moderate to severe 
illness in this study. It is possible that our focus on moderate to 
severe diarrhea overlooks mild illness due to EAEC and that 
true determinants of pathogenicity are not recognized. Alter- 
natively, illness may be obscured by epidemiologic factors, such 
as previous exposure. Also, we note that our EAEC definition 
included two AggR-related genes, potentially introducing strain 
selection bias. 

This study is notable for the association of the Shigella viru- 
lence factor SepA with clinical illness, an association that per- 
sisted when the effects of other pathogens were considered (OR, 
5.6; P = .0006; data not shown). SepA was first described by 
Benjelloun-Touimi et al [20] and is a prominent extracellular 
protein secreted by S. flexneri strains. SepA is produced during 
infection [20] and has been shown to confer increased epithelial 
cell exfoliation from human intestinal explants infected with 
S. flexneri [49]. We also note that SepA is produced by the Shiga 
toxin-producing outbreak strain from Germany in 2011. The 
previously unsuspected role of SepA in EAEC warrants further 
investigation. 

Leveraging a large epidemiologic study and powerful genomic 
techniques, our study sheds additional light on the complex 
nature of diarrheagenic E. coli genomes and their association 
with human disease. 